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£SJ , Abstract The Law of Large Numbers tells us that as the sample size (N) is increased, 

the sample mean converges on the population mean, provided that the latter exists. 
In this paper, we investigate the opposite effect: keeping the sample size fixed while 
increasing the number of outcomes (M) available to a discrete random variable. We 
establish sufficient conditions for the variance of the sample mean to increase mono- 
, <""| tonically with the number of outcomes, such that the sample mean "diverges" from 

the population mean, acting like an "reverse" to the law of large numbers. These re- 
sults, we believe, are relevant to many situations which require sampling of statistics 
of certain finite discrete random variables. 

Keywords Law of Large Numbers • Convergence and Divergence of Random 
■ Variables • Vandermonde matrix • Hypergeometric sums 
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ON ■ 1 Introduction 

cn 

C^) ' In probability theory it is customary to investigate two broad families of problems, 

(i) the convergence of sums of large numbers of random variables, and (ii) the esti- 
mation of likelihoods of deviations of those sums from their large number limits Q~]. 
The probability spaces of the random variables in question are usually fixed and one 
investigates how fast do properties of a sample drawn from the statistical population 
converge towards properties of that statistical population (to be called population for 
, brevity, later on). 

A good example of this is one of the most fundamental theorems of probability, the 
Law of Large Numbers (LLN). The weak version can be stated (see [2] for example) 
as : 
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Let X\,X2, ■ ■ ■ , -Xjv be an independent trials process, with finite expected 
value /j, — E[Xi] and variance o z = Var(Xi). Let S N = J2"=i *v Then V e > 0, 

lim P 

N^oo 

Noting that the quantity =M- is nothing more than the sample mean, Xjq, the Law 
of Large Numbers can be stated in words as "for independent trials of a sample of 
a distribution with finite population mean, the sample mean should approach the 
population mean, as the number of trials (i.e. size of the sample) gets very large". 

In this paper we pose questions in a different way. We consider a sequence of 
populations from discrete, finite probability spaces whose size (to be termed number 
of outcomes in what follows) increases. We draw a sample of a fixed size from each 
of these populations, and analyze under what conditions, and how quickly, do the 
properties of the samples diverge from properties of the populations. In effect we are 
increasing monotonically the number of outcomes available to each trial, while keeping 
the number of trials, of sample size (N) constant. In other words we formulate the 
"Reverse" of the Law of Large Numbers. 

We illustrate this using a thought experiment based on a real life situation - betting 
on horses. Each horse is given a unique label. Here the random variable in question is 
the label of the winning horse, the number of outcomes M is the number of horses in 
the race, and the sample size N stands for the number of repetitions of the race. If the 
number of horses in the race is fixed (M = const) then, if the number of race repetitions 
becomes large (N — ► oo), the average label of the winning horse converges towards that 
of the expected value of the label of the winning horse. We have made the idealized 
assumption that the probability distribution itself remains fixed thus ignoring aging 
effects, weather changes, different tracks etc. Now imagine that, as time progresses, 
new horses are added to the race indefinitely (M — > oo) and that each time a new 
horse has been added, the same number of races (N = const) is held. In this situation 
we intuitively perceive that the average label of the winning horse becomes less likely 
to be linked to the expected value of the label of the winning horse. In other words, 
the variance of the average label of the winner will increase at a rate that depends on 
the number of horses that participate in the race and on the probability distribution 
that a given horse wins. 

Another thought experiment to illustrate this idea is to think of an unbiased M- 
sided "die" . Here we ask what value appears face up when we throw the die. If we think 
of the standard die, with M = 6, and consider N = 1000 throws, we would expect the 
average throw to be pretty close to the expected value of, in this case, 3.5. However, if 
we increase M to larger and larger values, but keep TV fixed, we would no longer expect 
to always get sample means close to the population mean (expected value). 

In the above examples, we looked at one particular statistic - the sample mean. 
The Law of Large Numbers, in equation JTJ , tells us that the sample mean approaches 
the population mean as the number of trials increases. This is normally proven using 
Chebyshev's inequality (assuming that the variance, a 2 , is finite), which for the sample 
mean, is given by: 

P[\X N -n\>e]<^ (2) 

2 rr 2 

Here we used the fact that the variance of the sample mean is given by: a-^ = ^r. 
Chebyshev's inequality offers a sense of "how large" N must be to see the desired 
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convergence described by e. For fixed N and e, the variance of the random variable, X, 
controls the bound on the probability of the sample mean being close to the population 
mearQ. 

In this paper, we look at probability distributions of finite discrete random vari- 
ables. We consider what happens when additional outcomes are deemed possible, but 
the distribution still retains the same basic form. We then have, in essence, a sequence 
consisting of probability distributions that are all similar, but with higher terms in the 
sequence corresponding to those distributions with a higher number of outcomes (M). 
We also have corresponding sequences for the expected values, and variances of these 
distributions. We are interested in those such sequences of variances which diverge as 
the number of outcomes becomes large. Then, from equation ((2j), and for fixed N and 
e, it is clear that sample means for the corresponding random variables would not be 
likely to be close to the expected values of those random variables. 

Therefore, we seek conditions for these sequences of variances to diverge as the 
number of outcomes becomes very large. We also attempt to determine classes of 
(sequences of) distributions that correspond to particular rates of divergence of the 
variance. In the course of this work we will use the terms probability measure and 
probability distribution interchangeably. We will call the sequences of distributions 
described above, as simply distributions (that are functions of M), and the sequences 
of variances as variances (that are functions of M). We will attempt to formulate 
our considerations and results in the axiomatic language of Kolmogorov's theory of 
probability [3]. 



2 Theoretical formulation 

2.1 The Probability Distributions 

Both the weak and the strong LLNs are proven without imposing any assumptions on 
the sample space fi and on the probability measure P. Instead one only assumes the 
existence of the first moment, in the case of the weak law, and both the first and the 
second moments in the case of the strong law [4]. 

However, exploring the other extreme, namely the limit of the number of outcomes 
becoming very large (M oo) subject to the sample size N being fixed, does in fact 
require the knowledge of both the sample space and the probability measure. Hence, 
we have to formulate assumptions about f2 and P. 

Consider a finite discrete random variable X with (M + 1) possible outcomes fi = 
{m,j}jL , with associated probabilities {fj}jLo with M € N > 10 For our purposes 
it is necessary to define the probability distribution as an explicit function of the 
outcomes, rrij. Thus, we use a polynomial representation given by: 

P- 1 M 

P J : =^ = ^E a « m ^ forj = 0,...,M (3) 

n— 



1 Of course, this probability can never be greater than 1 so bounds exceeding 1 are "loose" . 

2 There are (M + 1) possible outcomes as we index from to M. For compactness we will 
continue to use M as the number of outcomes. Indeed, as M becomes very large, there is little 
difference between M and (M + 1). 
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Here the numbers (Pj) -_q are like "unnormalized" probabilities and the normalization 
factor, 91, is given by: 

M 



W:=£ P i (4) 



The coefficients, {an} n= Q, are real numbers which can be computed by inverting the 
linear relationship in <[3j) and thus inverting the matrix n = m j n - As J is the 
VanderMonde matrix [5] this can always be done; thus, all finite discrete probability 
distributions can be represented in this form. 

We hope to look at random variables with more arbitrary outcomes in future work 
but for this paper, we look at a specific set of outcomes, mj = j, j £ {0, 1, . . . M}. In 
this case, equation ([3]) becomes: 

p. , « 

^ : =^ = ^E^' n forj = 0,...,M (5) 

n=0 



3 ■ ' ) "reduced 
J / j=l 

probabilities". We can write the other coefficients {Sn}*Li m terms of the reduced 
probabilities: 



:=XX«^ (6) 



J 



for n — 1, . . . ,M. Here (%Lj,n)j^lj* n= i is the inverse of J := (j n ^ ' . Again, 

since J is a Vandermonde matrix it always invertible. 

We note that any quantities that describe the sample (the sample mean, the vari- 
ance of the sample mean, for example) depend explicitly on the coefficients in (JBJ). 
We will see that, depending on those coefficients, we will obtain different large-M be- 
haviour of those quantities and, in particular, of the variance of the sample mean. Some 
of these coefficients may be zero. For our analysis it is useful to use s > for the order 
of the polynomial in ([5} such that a s 7^ and aj = for all s < j < M. We will see 
that the variance of the sample mean scales differently with the number of outcomes 
depending on s. Both s and the coefficients a n may vary as M varies. 

Recall that the variance of the sample mean depends on the variance of the un- 
derlying random variable. To look at the behaviour of the variance for different M, we 
need to find a closed form expression for the inverse Vandermonde matrix. Further- 
more we believe that the closed form expression for the inverse will be useful for other 
mathematical problems, like polynomial least square fitting, Lagrange interpolation 
polynomials [B], and reconstruction of a statistical distribution from the moments of 
the distribution [7]. 

The expression for the inverse was, to the best of our knowledge, previously un- 
known. We give it below. The inverse (2lj in ) jl _l 1 . —1 reads: 

r-iv'+ n \ 
- < n }Jn,-„v ■ E Pi +P (M)(-nr (7) 



(n-l)\{M-n)\ 



p=0 



=m)^e( e rm( e n ^ 

,l<gi<— <q p <n-l 1=1 Hl ) \i>|K !j+ i<-< g ,-i<Mtp+l Hl j 
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Here the polynomials, Pi(j), satisfy the following recursion relations: 

3 

P M -j (M) - Pm~j (M - 1) = - £ JV-j+p (AO (- Af ) p (9) 

P =i 

for j = 1, . . . , M — 1 with Pj^(M) = 1. The lowest ten polynomials (j = 1, . . . , 10) are 
listed in (|101[) - (|110[) in Appendix C. Therein attached is also a piece of Mathematica 
code that tests the validity of those expressions. The proof of ([7|) is given in Appendix 
B. Note that rows with low (1, 2, 3) and with high (M, M — 1, M — 2) indices have 
a particularly simple form; the complexity of the expression rises when the row index 
tends towards M/2. Setting j = 1, 2, 3 in © and j = M, M - 1, M - 2, M - 3 in © 
we obtain: 

2ti,„ = (-1) 1+ "C^ (10) 



21 2 , „ = (-l) 1 ^ ^ J(li< 9 <„-i + 1„+i< 9 <a/)J (11) 

213, n = ( — l) 2 2ll,n ^j^'l'i+l<9l<92<A/ 
+ ^ ll<<ji<n-lln+l<(j 2 <A/ll<(ji< 9 2<n-l ) (I 2 ) 



9192 



2t M -3,n /(M-2)(Af-l)M 2 (M + l) 2 



(15) 



«M,„(-i)- 3 V 48 
-^(Af - 1)M(M + 1)(2 + 3M) + yM(M + 1) - n 3 ^j (13) 

_^^ , M - 1 )m,m ;1K2 + 3 m, _„ m , m + 1) + „, ) (u) 

gjf-ig = / M(M + 1) _ 
^(-l)- 1 V 2 

a - = (n-l)KM-n)! ( 16 ^ 

Note: The inversion of the Vandermonde matrix has applications in control theory [5J 
9,10 , in signal processing problems [11] and in systems theory [TO, 12. 13. 14 . In order 
to solve the problem one typically makes use of the Lagrange interpolation formula 
and finds the rows of the inverse matrix by computing the coefficients of the Lagrange 
interpolation polynomials related to the matrix elements [8]. An alternative approach 
was proposed in [15] where the inverse matrix is expressed as a product of two matrices 
one of which is diagonal and the elements of the other one are given through certain 
recursion relations. In this way the total number of operations needed to invert is 
reduced from 0(M 3 ) to 0(M 2 ). Finally in [TT] one expresses the elements of the 
inverse through totally symmetric polynomials of the elements of the original matrix. 

Our method of computing the inverse (presented in Appendix B) is a way of ac- 
tually solving the recursion relations from [15] or summing up the totally symmetric 
polynomials analytically if the elements of the original matrix are certain real or com- 
plex powers of a constant or of an arithmetic progression. Indeed, even though the 
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elements of the original matrix are first powers of an arithmetic progression in our 
work i.e. Xj = j, the manipulations (|97|l -(|99 | l can also be done analytically in the 
generic case of arbitrary powers, with little effort. Thus, we believe, that our method 
is superior to the methods known in the literature and can be applied to produce more 
efficient numerical algorithms of use in the areas described above. 

Now we fix N > 1 , we draw a sample of size N from the population described 
in ((5]) and we conjecture that the quantities that describe the sample (estimators of 
population parameters) diverge from those that describe the population, if the number 
of outcomes M becomes very large. To be specific, we analyze the variance of the 
sample mean, and we show that, except for very "unusual distributions", it increases 
monotonically with M, if M is big enough, which implies that, for fixed N, the sample 
mean diverges from the population mean when M — > oo. In most computations below 
we will use asymptotic limits (M — > oo) rather than exact results. However, deriving 
exact results is not an essential difficulty; it amounts to performing more work, which 
is not necessary for our purposes. 



2.2 The variance of the sample mean 



The variance of the sample mean reads: 



N 

E Xi 

i=l 

N 



a 
~N 



((x-(x)f) {x 2 )-(xy 



N 



N 



(17) 



It depends on the variance, a 2 , of the random variable. Using equation ((5} and Faul- 
haber's formula (|58p. this variance can be written as: 



2 1 




2M+4 

E !3 { n\M + iy 

n=0 

2M+2 

E [ n'(M + iy 

71=0 



(18) 



where 



,(2) 



22 ^n 1 <M^n 2 <Man 1 an 2 Bk 1 Bk 2 
n=rti +W2+4— (ki+k 2 ) 



712+3 



711 + 1-1 



fc 2 <7i 2 +3 Cfci" ' L ki<m+1 



712+2 



1 /tH1+2i 
J-fe 2 <7l 2 +2 O fei lfc 1 <,i 1 +2 



712+3 

E 

7i=7t i +712+2— (ki+k 2 ) 



ni + 1 ri2+2 

^n 1 <M^n 2 <MO'n 1 an 2 B kl B k2 



n x +2 



19) 



1 fei<ni+l 



ni + 1 



7l2 + l-| 

k 2 



L k 2 <n 2 + 1 



n 2 + 1 



(20) 



where n\,n2,kx,k2 £ N. The numbers, B ki , are the Bernoulli numbers. 

Thus both the variance a 2 and the variance of the sample mean are rational func- 
tions of the number of outcomes M. Since we are only interested in the large-M be- 
haviour of the variance of the sample mean, we do not need to simplify expressions (I19| l 
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and (|20[) for the coefficients but instead we only need to work out leading order terms 
in both the numerator and the denominator in (|18[) . Thus, we assume that the order 
of the polynomial in © is s > 0, meaning that the coefficent a s is non-zero and the 
last (M-s) coefficients, {5b}^L s +1 m the expansion ([5]), are zero. Then the coefficients 
for the highest order terms in the numerator read: 

fl« - (21) 

P-2S+4 - (s + 1)(s + 3)(s + 2 )2 W 
M) = 2a «-i a « ( a «) 2 /nnN 

p 2 S +3 s(s + 1 )( s + 2 )( s + 3) ( s + i)( s + 2 )0 + 3) 1 ' 

M) _ -a s a s -2 a s _ia s _i 3a s q s _ 2 , , 

p 2s+2 - s(s + 1 )2( s + 2 ) + s ( s+ l)2( s + 2 ) + ( s _i) s ( s + 2 )(s + 3) 1 ' 

a s a s _i2(3 + 2s) (a s ) 2 (2s + 3) 



s(s + f)(s + 2)(s + 3) (s + l)(s + 2)(s + 3) 



,(2) _ (5 S ) 2 



^2 = 7^2- (24) 



(2) _ 2a s _ia s _ (a s ) 2 
' 2s + 1 ~ s(s + 1) (s + 1) { ' 

Dividing the polynomials in (|f 9|l and (|20[) by one another we obtain the following 
expression for a 2 , for large M: 

- 2 - ( M + I) 2 , M^hv + (M + f ) f 2^=i - . j^r^ + °<W") 
(s + 3)(s + 2y \ a s ) s(s + 2) z (s + 3) 

Subbing this into equation (|f 7|1 will also give an expression for the variance of the 
sample mean as a function of M. 



2.3 The Large M behaviour of the Variance 



We now wish to examine whether a converges or diverges for large M. Equation 
can be rewritten as: 



2 

a = 



(- + 1) 



( S + 3)(s + 2) 2 



M 2 + 2M ( ° 



s J \ a s ! \s 



(27) 



As noted previously, it is possible for both the order, s, and the coefficients, a n , to 
vary with M. Looking at the term in the square brackets, we notice that the first term 
(M 2 ) will always dominate the third term (A/) and the second term (2M ( g |~ 1 \ (i)) 

will always dominate the fourth term (2 ( a | -1 ^ (j))- Therefore we want to consider 
the large M behaviour due to the first and second terms. The exact behaviour of the 
second term will depend on the distribution concerned, but we can identify four broad 
cases depending on the behaviour of the ratio ^ °g -1 J (j) in the large M limit: 



1. 


lim 


a s -i 




M— >oo 


a s 


2. 


lim 


3 S - 1 




M— >oo 


as 



- } <oo 

s / 

- ] 



« M 



s 



3 A ^(%r) (7) — ^ 2 m(^) ( i)>^ 

The first case corresponds to when the limit of the ratio is finite. The second case is 
when the ratio grows in absolute value with M but at a rate slower than M. The third 
case is when the ratio grows with M at a rate faster than M. Finally, the fourth case is 
when the ratio grows in absolute value with M but is negative. We now consider each 
case separately. 



2.3.1 Case 1 



In this case, we do not need to worry about the coefficients, and how they depend on 
M to understand the large M behaviour of the variance. It is clear that only the first 
term from equation (|27|l matters as M — > 00. Thus we have: 

a 2 w (f+1) m 2 (28) 
(s + 3)(s + 2) 2 v 1 

Now, let us consider three examples. Firstly, s does not depend on M; secondly, s ~ 



M, i.e. lim s/vM = 7 where < 7 < 00; and finally, s ~ M, i.e. lim s/M = 7 

M^oc M— >oo 



where < 7 < 00. From ([28} we see that in all three instances, both the variance and 
the variance of the sample mean (for given N) diverge as the second, the first and the 
zeroth power of the number of outcomes respectively. That is, we have: 

a 2 (s + 1) 

lim —-r = v . . ' , 9 < 00 (29) 

A/— >oo M 2 (s + 3)(s + 2) 2 v ' 

2 1 

lim — = — (30) 
M->oo M 1 7 2 

ct 2 1 

um TT7T = ^7 (31) 

M—oo M° 7 2 v y 

We now go on to provide three explicit examples of such distributions. In all three 
examples, for fixed sample size N, the sample mean, or the estimator of the population 
mean, diverges from the population mean, yet at different rates. We describe these 
examples as Accelerating Divergence, Divergence, and Decelerating Divergence, 
respectively. 



Accelerating divergence: We generated a sequence of non-zero real parameters (a n ) n —i 
and computed the normalized probabilities (Pj) -_ from ([5}. We run over s = 2, . . . , 6 
and for each value of s we plot both the normalized probabilities, for M — 100, as 
a function of j, and the variance of the distribution as a function of the number of 
outcomes M, in the double logarithmic scale (see Figure [l}. It is readily seen that the 
variance behaves asymptotically as the second power of the number of outcomes, and, 
in addition the asymptotic behaviour is attained quite quickly. 

Note: We reiterate that this type of divergence is obtained for every distribution whose 
unnormalized probability distribution can be represented by a polynomial of order s 
that does not depend on M, when M is large. 
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Fig. 1 The probability distributions in equation J5]l for different values of the order of the 
polynomial s = 2, . . . , 6 (left) and the variance of that distribution as a function of the number 
of outcomes M = 100000, . . . , 200000 (right). A second order polynomial was fitted by least 
square regression to each of the data sets on the right. (This was done to confirm the relation 
in equation |(28}). The parameters of that fit are (-3.2831 (-3.2834), 1.998, 0.000087), (-3.6240 
(-3.6243), 1.998, 0.000097), (-3.9195 (-3.9199), 1.998, 0.000138), (-4.17872 (-4.1795), 1.9959, 
0.00023), (-4.4096 (-4.4102), 1.99677, 0.00018) for s = 2, . . . , 6 respectively. The theoretical 
values of the first parameter (the intercept) are given in brackets and read log((s + l)/((s + 
3)(s + 2) 2 ). The theoretical values of the second and the third parameter are two and zero 
respectively. 



Divergence: Here we repeat the procedure from the previous point with one differ- 
ence, namely that the number of parameters depends on the number of outcomes like 



equation (|30f) . For the sake of simplicity let us take a n = S n 7v /jj where 7 > 0. Then, 
asymptotically, the unnormalized probabilities Pj, and the norm 91, read Pj = p^~^ , 
91 = ^-^==q- — respectively. The probability distribution is plotted in the first graph 
of Figure [2] The first and second moments, and the variance read: 

(Si = X> 7yl7 (^+1) = M ( 32) 



3=0 



7VM + 3, 

(j 2 ) - (j) 2 = M 2 ( + 1 \ = J_ M (34) 



. (7VM + 3) (7VM + 2) 2 / M ^00 Y 

In Figure [5] we plot the variance of the distribution as a function of the number of 
outcomes M. Now the variance diverges asymptotically as the first power of the number 
of outcomes. Here, however, the asymptotic behaviour is attained much slower than in 
the previous case. 

Note: We emphasize that this type of divergence is characteristic for every distribution 
whose probability function can be represented by a polynomial of order s that behaves 
like the square root of the number of outcomes M, for large M. Indeed, setting s = 
soVM in d27j) we get: 

a 2 M -L + 0(l) (35) 

M^oo Sq 

Decelerating Divergence: Finally, we consider an exponential distribution 
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Fig. 2 The probability distribution Fj = (left) and the variance of that distribution as 

a function of the number of outcomes M = 100000, 500000 (right) Here, the parameters of the 
linear regression second order polynomial fit are (-0.3±0.0027, 1.04±0.0004, 0.002±0.000017) 
compared to the theoretical parameters (0.0, 1.0,0.0), 



for j = 0, . . . , M and a > 0. For large M, the distribution can be well approximated by 
its Taylor expansion truncated at the M th order and as such, it would correspond to 
the polynomial form in equation ([5]). Then a n — {—a) n /n\ and, in the large M limit, 
the variance takes the following form: 

j2 _ 1 ( l-eM-«M)(l + aM+^ + ^) \ J_ 

q 2 1 l-exp(-aM)(l + aM) Ji/-*oo a 2 ^ ' 



Recall that equation (|37[) is only an approximation. In order to check the validity of that 
approximation, we plot the approximated variance of the sample mean a\: = a 2 IN 

along with the exact result in Figure [3] We can see from the figure that the results 
do not differ by more than 1% for M > 10. 

Now the variance behaves asymptotically as the zeroth power of the number of 
outcomes and this behaviour is attained much faster than in the previous case. 
Note 1: Again we stress that this type of divergence is characteristic for every distri- 
bution whose probability function is a polynomial whose order s is proportional to the 
number of outcomes M. Indeed, setting s = sm in (f26|) we get: 



° 2 = -K + °(tt) (38) 



From the above discussion it is clear that the condition: 

(^] < oo (39) 

is sufficient for the variance to exhibit divergent behaviour in the large M limit. It 
is however an open question to specify both necessary and sufficient conditions 
for distributions to exhibit the behaviour specified above. One way of doing that, is 
to require that the last (M — s) coefficients (5 n )*f_ s+1 are equal to zero. This, from 



3 We have derived the exact variance of the sample mean using MATHEMATICA. The 
expression is lengthy and cumbersome and, in our opinion, it does not bring much to quote it 
here. 



lim 

M— >oo 



(is 
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Fig. 3 On the left: The approximated variance (short dash) versus the exact variance (long 
dash) of an exponential distribution as a function of the number of outcomes for different values 
of a = 1.0, 1.5, 2.0 (from top to bottom). On the right: The difference of the approximated and 
the exact variance divided by the exact variance as a function of the number of outcomes. 



(|6]), yields (M — s) linearly independent linear equations for the unknown probabilities, 
given by: 

an = J2 %" ■ = (40) 
3=1 3 

for n = s + 1, s + 2, . . . , M. The solution to (gOj) is given in ffTT9]l along with (fT27)l and 
its proof is in Appendix D. 



Summary of Case 1 An s-parameter family of unnormalized distributions is given by: 

s 

P j =Po + ^ ( 0? J -(P 9 -Po) (41) 

9=1 q 

for j — s + 1, . . . ,M with the quantities being defined in (|127p . A probability 

distribution satisfying (|41|l and the constraint (|39|l has a property that its variance a 2 , 
along with the variance of the sample mean cr- 2 ^ (for a given N) , exhibits an asymptotic 
behaviour as in (|27|l. This implies that the large-M behaviour of the variance of the 
sample mean can range in a continuous fashion from a quadratic divergence (o\ — 

A TV 

M 2 ) to asymptotic convergence (a\ ~ M°), for fixed sample size, N. Furthermore, 
if we assume the order of the polynomial in J5J, s, has a power law dependence on M 
for large M, of the form s ~ M a , with < a < 1, then, from ([27]), and for fixed TV the 
variance of the sample mean behaves as a% ~ M 2_2a . 

We did some numerical testing of our results. For particular values of M = 50 
and s = 10 we have found the solutions (|127p by solving numerically equations Q40[). 
Subsequently we found the unnormalized probabilities from (|41|) by taking arbitrarily 
P = and P 9 = (-1) 9 for g = 1, . . . , s. We plot the results in Figureg] 

In addition, for a particular value of s (s = 10), and for M = 12, . . . , 50, we 
computed numerically the variance of all the s different probability distributions in 
(|41[) and we plotted the results, as a function of M in Figure [4] as well. As we can 
see, the variance displays a quadratic dependence on the number of outcomes and the 
leading order coefficient fits in well with the result in (I27|l . 
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Fig. 4 On the left: All probability distributions for case 1. Here M = 50 and s = 10. We plot 
r ■ \ M 

the quantities ( QJq.j - I for q = 1, . . . , s and the distributions arc constructed as in l|41[l . 

On the right: The dependence of the variance of the probability distributions shown in the 
left of Figure [4] on the number of outcomes. Here s = 10 is fixed and M = s + 2, . . . , 50. The 
graphs are overlayed with a polynomial of second order fitted by least square regression. . The 
leading order coefficients read 0.00603, 0.00606, 0.00607, 0.00609, 0.00609, 0.00608, 0.00607, 
0.00601, 0.00593, and 0.00587 compared to the theoretical value 0.00587 (compare equation 
(25}). 



2.3.2 Case 2 

Recalling section [2T3T case 2 includes all probability distributions that satisfy the fol- 
lowing conditions: 

lim 2»zl (I) = oo (42) 
M— >oo a s \s J 

2M (-) <. M 2 (43) 

a s \s I 



For this case, looking at equation (|27[) it is clear that only the first term is important 
for large M. Thus one can use equation (|28|l in the previous section and follow the 
discussion of divergence there. Thus case 2 distributions will also exhibit divergence 
in the variance as M — * oo. This implies, as before, that, for fixed N, the large-M 
behaviour of the variance of the sample mean can range in a continuous fashion from 
a quadratic divergence (a^ ~ M 2 ) to asymptotic convergence {a\ ~ M°). 

2.3.3 Case 3 

Case 3 distributions are those that satisfy: 

l im ( *£±) (T\ = oc (44) 



a-s-i \ / 1 



2M I ) ( - ) > M 2 (45) 



Unlike in previous cases, it is clear that now the second term in equation (|27p will 
contribute to the variance in the large M limit. However, the variance and the variance 
of the sample mean (for fixed N) will still diverge as M — ► oo. But if the inequality in 
(|45l) is strict, then the second term in equation (|27[1 now controls the exact behaviour 



of the divergence and it will be faster divergence than the maximum possible (M 2 ) in 
the previous two cases. However, unlike in the previous cases, we do not analyze this 
behaviour any further. 
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2.3.4 Case 4 

In the previous three cases, for a given N, the variance of the sample mean increases 
with the number of outcomes, which might imply that this holds for every distribution. 
However this is not the case. One can construct distributions such that the variance of 
the sample mean decreases with the number of outcomes. 

For example, take the (unnormalized) probability distriubtion given by: 

P,-=i M2 (46) 

Although this suggests a polynomial of order M 2 , for each M, it can always be de- 
composed in a unique way into a polynomial of the form in ([5}. In this case the 
leading order term in the decomposition corresponds to s = M. Asymptotically, we get 
<yi = M m2 + 1 /(M 2 + 1) and the variance reads: 

2 , a ( 1 + M 2 



a — M ^ (3 + M 2 )(2 + M 2 ) 2 ) < 47 > 

Hence the variance is inversely proportional to the square of the number of outcomes 
M, for large M. Thus we have convergent behaviour for the variance. 

If we now solve for the ratio a | -1 in equation (|27|l by requiring that the variance 
behaves like M~ 2 for large M, then we get: 



-M 5 - 2M 4 + 7AI 2 + 16M + 12^ 
(M(M 2 + 2M + 1) 



(48) 



This equation combined with the fact that s — M confirms that the probability 
distribution in ([46 [I fits under case 4 as its satisfies the conditions given by: 

l im .__(°S=±) (I) =-00 (49) 
,2 



M^oo \ a s J \s 



2M (¥) u) ^ - M ' z (5o) 

It is also possible to show that all distributions of the form Fj = j AI where 
x > 1, like that in equation (|46p . will have a variance that converges as M — > 00. 
These distributions are, however, quite unusual since most probabilities are assigned 
to very few outcomes. It is still an open question as to how many distributions will 
have variances that will exhibit convergent behaviour for a large number of outcomes. 



2.4 The Percentiles 

So far we used the variance as a measure of the "width" of the distribution. This may be 
unsatisfactory because for certain distributions the variance may not carry the relevant 
information - an example is the case where the variance does not exist. It is thus much 

more useful to compute percentiles of the distribution of the difference between the 

N 

sample and the population mean, Z := (Y] Xj)/N — fi, and investigate the conditions 

i=l 

under which they diverge. Here /i := (X) is the mean of the random variable X. Recall 
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that the p-percentile z p of random variable Z is denned as such a value of argument 
such that the Cumulative Distribution Function (CDF) equals p%, i.e. 

p{z>Zp)= wo (51) 

M 

Here < p < 100. We denote by 01 := ~}2 Pj the normalization constant and we 

0=0 

compute the CDF as follows: 

M N 

M ( N \ NM ( N \ 

= ^v e ns e e n^^<M (53) 



ni,...,n w =0 \p=l ^ / l=(z+fi)N n \P=1 / 

E jj>=i 
p=i 

m n a «p 

= ^ E (gW((™) l " l+ "-(^") l " l+ ") <«> 

ni,...,n.iv=0 

1 1 ^ (iVM^-^ + ^iv)^ gjy 
- gtiv (jv-1)! ( P + iV)p! p 1 J 

MTV jv_i MN p 

E (ib^E^f < 56 > 

;=(2+M)A r p=o 

In l)52p we used the definition of the CDF and the Law of Conditional Probabilities, 
and we conditioned on the random variables in the sample, while we used equation ((5JI 
in (|53l) . In (|54|) we neglected the indicator functions in the last term in parentheses, 
which we can do for large M, and we summed over the j values using the identity (1591 1 

and we summed over I using elementary number theoretic identities. We also defined 

N 

|n| := n P as the L 1 norm of the n vector. In (|55|) we introduced the normalized 
p=l 

A^ th auto- convolution af N of the coefficients (a n )*Lo v ^ a: 



N-l+p 



E 



i' 



E ' 

9 = 1 



N 

n< 

9=1 



(57) 



Note: Expression f|56p uses an approximate identity (|59[) and is therefore only a start- 
ing point for analyzing the scaling of the percentiles of the difference between the 
sample mean and the population mean. In order to provide further insight into this 
problem one needs to compute the -/V th autoconvolution from (|57|) , (O and from the ex- 
pression © for the inverse Vandermonde matrix. This will make it possible to uniquely 
classify distributions according to the asymptotic behaviour of their percentiles. We 
will accomplish that goal in future work. 



15 



3 Conclusions 

All finite discrete probability distributions may be represented by polynomials in the 
possible outcomes of their random variables, by inverting the relevant VanderMonde 
matrix. We used this fact to examine the convergence of certain sample properties of 
finite discrete distributions. In particular, we considered families of probability distri- 
butions that had different numbers of outcomes (M) but were otherwise similar. Such 
families of distributions can be represented as sequences whose terms are indexed by 
M. Their variances can also be described as sequences in M. 

In this paper we considered the special case of integer outcomes {0, 1, . . . M}. We 
calculated an expression for the variances of such families of distributions using large M 
approximations, and then examined their behaviour as M —> oo. We found conditions 
for divergence of the variance with increasing M and gave some examples. We discussed 
some necessary and sufficient conditions for distributions to exhibit specific types of 
divergence. We also gave examples of families of distributions that did not satisfy these 
conditions for divergence and whose variances actually converged as M — > oo. Finally, 
we have provided an expression for the Cumulative Distribution Function (CDF) of the 
difference between the sample mean and the population mean. This expression can be 
used to analyze the asymptotic behaviour of the percentiles of the difference between 
the sample mean and the population mean as a function of the number of outcomes 
M. 

In obtaining the results, we derived many mathematical identities and they are 
listed in the appendices. We also derived an expression for the inverse of the Vander- 
monde matrix used in this paper, which was previously unknown. We believe these 
formulae could be useful in many different fields. 

We hope, in future work, to generalize our results to arbitrary probability distribu- 
tions, to quantify better what distributions these results apply to, as well as examine 
the behaviour of the variances of these distributions, for an increasing number of out- 
comes, relative to the mean. 

In conclusion, this work should be relevant to situations where one must sample the 
distributions of finite discrete random variables. It highlights how adding new outcomes 
to such random variables (without any change in the overall form of the probability 
distribution) may still affect how the statistics converge, with increasing sample size, 
to the population properties. More importantly, we have proven that for certain finite 
discrete probability distributions when N is kept fixed, the variance of the sample mean 
actually diverges as M — > oo. We termed this result as the "Law of Many Outcomes", 
or alternatively, the "Reverse of the Law of Large Numbers". 



Appendix A 

In this appendix, we list certain identities used in the main body of the paper. The identities 
158I I- H7U all relate to sums of powers of integers over certain sets. The identities J72P - (182 ft 
involve sums of binomial coefficients. Then the remainder of this appendix is a generalization 
of the standard proof of the Vandermonde determinant. This is used in Appendix B to calculate 
the inverse of the Vandermonde matrix used in this paper. 

In the following we denote by (-Bfe)^L the Bernoulli numbers. 
Faulhaber's Formula: 

M n ID 

j=0 k—0 v ' 
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Sum Over Simplex I: 



JV 

jv n v 

2^ 11* -« (| n | +JV -i)! (M) 

N p=l vl 1 ' 

E 
P =i 

JV 

Here |n| := 23 n p . The result is valid for big values of n only. 
p=l 

Sum Over Simplex II: 

s N B +s 

E X\ip P = £(» + irc« (60) 

0<ji <.. .<J S <n p= 1 m— 1 

3 . n — 1 

Here JVj := E n p i s the L -norm of the sequence (n p )p_ 1 , the symbol (a)( n ) := n (a — p) 
p— 1 p— 

is the Pochhammer symbol, and the coefficients Cj, read: 



n (N q + q) 
9=1 



(61) 



:= t . % (62) 

9=1 



c (s) ■= V" 



Wl+jl)(2) B 2 

^ n (w«+9-2-w) 2! 

9=1 



E — ("n") (63) 

1<J1<J3<« 11 (-^g + 9-l 9 >ii -l«>ia) ' 

9 = 1 



^■■= e i n^f «*) 

l<ii<-<3jv s+s -m<s fl (Np +P- £ 1 P >j:) P =1 P ' 
p=i ;=i 

for m = 1, . . . , iVs + s. Here the numbers dp are multiplicities of elements of the sequence 
{jp)p—i, i-e. such numbers that the sequence (jp)p =1 falls into I groups composed of equal 
elements, such that the first group has length d\, the second group length cfe, up to the I th 
group who has length di. 

In particular when rij = 1 and thus Nj = j for j = 1, . . . , s then we have: 

1<J1 <. ..<32s — m P=l p— 1 ^ 

We were able to resum the series for particular values of m, using identity l|76|l . and we 
give the result below: 

c w = _•_ 

2a (2s)!! 
( ) 1 A (2. + 1)11 

2s " 1 (2s -1)!! V3(2s-2)!! 



(2s - 2)!! V3 2 v 2! 
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(2s - 1)! 



(2s ■ 



3)!! (2s - 
1 2 



1)! 



1/2 
3 \45 
B 2 



7 
135' 
1 



4 

27' 



LI' 



2! 



Bi + -(2s + l) 



(67) 



The identity (J60J is proven by performing the sums over the consecutive indices j a , . . . ,ji 
by means of the Faulhaber's formula. 
Sum Over Simplex III: 



s N B +s N s 

E IK* = E E cW^tt-a+ir."' (68) 

a<ji < .. .<j s <b p— 1 mi— 0m2— 

where the coefficients C^' are defined in H64II . The identity H68I I follows from substituting 
j„ = j q — a for q = 1, . . . , s, from expanding the resulting power terms in binomial expansions, 
from applying (1606 to sum over the sequences (j q ^ an d from the identity H82I I, 
Sum Over Simplex IV: 

s L : = E n^;=r 1 =cj=ii: i, '" , (- i ) a+1 " Ic l+i 1 -« ( 69 ) 

q<jq-l<-.-<jo<jl=0 1=1 

Here j—i = j and jq = q and j — q > q+ 1. The series Il69|l can be rcsummed by means of the 
binomial expansion formula. 

As a simple corollary from J69D we have: 
Sum Over Simplex V: 



E n^r^oj 



5 9 .- 

3,3 

8+Pl<ig-l<--<i0<3-P2 i = 



V c 9+ f 1 " 1 c j 7 P2 - 1 c-'- 1 ,zf i zr 9 - (pi+P2) i? 2+1 (-i)«"+ 3 -( ! i+ i =+ i 3) ; af i f 2 ,j;7o) 

9-1 9+P1-2 J-P2— 2 12 3 v i ^h,h,ii ' 

i 1 =i,...,p 1 

l2=l,-- ,J-9-(pi+P2) 
l 3 = l,...,p 2 +l 

where 

aPl,P2 fr\ V"^ ^.9-1-91+1^91-92 + 1^92 + 1-, 1 i 

P h,h,h (q> '~ 2—1 h 12 h 1 92<P2 1 91-92<i-9-(pi+P2) 1 9-91<Pl-l 

0<92<91<9 

(71) 

Here the second bit on the right hand side in H7UH stands for the sum over ordered sequences 
bounded from below and above by q and j respectively and not bounded from below and above 
by q + Pi and j — p 2 . Here j — q > pi + p 2 + q + 1. 
A Binomial Identity: 

Let M > s £ N and p £ Z and p > — 1. Then we have: 

Ms 

E c*+i- 1 (j + s y+ 1 
j=i 

- c " r+i («■ + »<» — » g „_,„„': :;:,;?;, +,„+. ) <-> 

= - s,x) (73) 
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Note that for p — — 1 the term in parentheses on the right hand side equals unity. The quantities 

P+l 

f P (n, x) = (x + 1) Aq n q are polynomials of order p + 1 in the variable n with coefficients 
9 =o 

A p q +1 = A p q +1 (s) that depend on s. Following recursion relations hold 

(n-1) , , + ,„.i / s(a; + 2) + l x- + 1 \ 00 

f p (n,x) = -f p (n-l,x)+ y ' (s + n) p+1 = ( 1. v y h —^—n, ... I 

^ ^ (x + n) Jp ^ ' (z + n)^ ; ^ (2 + l)(z + 2) z + 2 / p= _ 1 

(74) 

for the polynomials and 

A p q t\l q >2+xA p +H q < p+1 = Af+ 1 C l + 1 (-iy+ 1 -<' + C%+ 1 s'>+ 1 -n q < p+1 (75) 

l=q-l 

and for the coefficients of the polynomials. Here q = p + 1, p, . . . , 0. 

Below we enclose a Mathematica script that test the identities I I72H and H73I I: 
(* Testing the binomial identity (59) *) 

s=l;p=6;x=7; 

Table[Sum[ Binomial[ x+j-1 ,x] (j+s) ~{} (p+D , {j , 1 ,M-s}] ,{M,2 ,20}] 

Table [Binomial [M-s+x, x+1] Sum[(M-s-l) ! / (q-1) ! (x+q-1) !/(x+M-s) ! (s+q) "{}(p+D , {q, 1 ,M-s>] (x+1) ,{M,2,20}] 

(* Testing the binomial identity (60) *) 
x=. ;n=. ; s= . ;p=3; 

ll=CoefficientList[ (x+n)Sum[A[q] n"{}q,{q, ,p+l}] - (n-1) Sum [A [q] (n-1) ~{}q,{q,0,p+l}] - (s+n)~{}(p+l) ,{n}] ; 
ss =Simplify [Solve [Table [ ll[[j]] == , {j , 1 ,Length[ll] }] .Table [A [q] , {q, ,p+l}] ] ] ; 
MyCoeffs =Table [ A[q] ,{q,0,p+l}] /. ss[[l]] 

MyF[n_,x_,s_] := Sum[ MyCoef f s [ [i] ] n"{}(i-l) ,{i , 1 , Length [MyCoef f s] }] ; 
s=4 ; x=7 ; 

Table [Sum [Binomial [ x+j-l,x] (j+s) ~{}(p+l) ,{j ,i,M-s}] ,{M,5,30}]- 
Table [Binomial [M-s+x, x+1] (x+1) MyF[M-s,x,s] ,{M,5,30}] 

A Hypergeometric Identity: 

Let a £ N and a q G R for q = 0, . . . , N. Then we have: 

£ W + aa-pil f = 1 ( 2s + 2a + l) !! » 

^ (2j-2)V. ^ q 2a + 3 (2s - 2)!! ^ 9 

J=l v u ' q—0 v 1 q — \) 

where the coefficients (A p )^_ satisfy the following system of equations: 

JV 

((2a + l) + 2(p + l))A p -2 ]T C q+1 (~l) q+1 - p A q = (2a + 3)a p (77) 

q=p+l 

for p = N, N — 1, . . . , 0. The system of equations always has a unique solution. The identity 
II76II is proven by elementary methods, i.e. by reducing the sum to a common denominator and 
factoring out the numerator, ft would be interesting to know if an analoguous identity exists 
in the case when the ratio of double factorials is replaced by a product of one or several ratios 
of that kind or of similar ones. 
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Corollary: Define: 

d 



©«(,):= Y TT ( 2 ^-P) !! ( 7 8) 

K I ,<...< JJ <» P =l Wp IP+^J" 



and 

d 



l<3K...<J d < S p=l ( IJV W +!))■■ 

for i = 1 , . . . , d — 1 . We term the quantities in 178 > and in (1 79 I t the sum over the bulk and over 
the border of a d dimensional simplex, respectively. 
Then the sums over the bulk of the simplex read: 



6< 3 >(s 
6< 7 >(s 



i s ( s -i)(4 s + i) 

77^0 - 1)0 - 2) (a - 3)(80s 3 - 120s 2 + 7s + 12) 



2296350 

1 (2s -1)!! 
405 (2s -4)!! 



(s - 2)(s - 3)(s - 4)(s - 5)(2240s 5 - 14000s 4 + 27580s 3 - 17815s 2 + 159s + 1566) 
(s - 2) (20s 2 - 5s - 3) 



1 (2s -3) ' 4 o , 

-, '—(s - 3)(s - 4)(112s 4 - 392s 3 + 329s 2 - 7s - 30) 

51030 (2s - 6)!P A A y 

1 (2s -5)!! 



6889050 (2s - 8)!! 

(s - 4)(s - 5)(s - 6)(-756 - 15s + 8894s 2 - 16035s 3 + 10820s 4 - 3120s 5 + 320s 6 ) (80) 
The sums over the border of the simplex read: 

6 (3.i) w= ;^-i;;;if_^_^ 

S C3,2)( S ) = 



(2s - 4)!! 5 V 21 21 
630 

© (4,2) 00 = -rlr( s " " 2 )( 2s " 3 )( 2s " !)( 2s + !) 

105 

S(4 ' 3)(S) = 9o" ( * " " 2)S(16 " 2 " 17S ~ 3) (81) 

The identities 11801 and 11811 follow from an iterative application of the identity H76I I . The sums 
are performed starting from the sum over the index with the lowest subscript and ending at the 
sum over the index with the biggest subscript. It will be interesting to find a generic formula 
related to a simplex of arbitrary dimension. In future work we will use the above results to 
conjecture the generic formula and prove that formula by induction. 
The Generalized Chu-Vandermonde Identity: 

e n<#-, p - 1 =<r lOP (82) 

0<q 1 <...<q s _ 1 <bp=l 

for go = and q a = b. The above identity is proven by an iterative application of the Chu- 
Vandermonde identity 1161 . Since the Chu-Vandermonde identity is closely related to the 
Gauss's hypergeometric theorem |17l . to DougalPs Formula 1181 . to Thomae's theorem 1191 
and to various other identities that involve generalized hypergeometric functions 1201 . it would 
be interesting to derive the many-dimensional analogues for those identities using our methods. 
Generalized Vandermonde Determinant: 
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Let M G N and 1 < a < M and 1 < Ji < . . . < J a < M with Jq = and J a +i = M, and 

3 . 
<5g 6 N for 9 = 1, . . . , a. In addition define := ^2 Sg, and xj^ := (xj) . and 



(4) - e n <*. 

a<ii<...<3i<bg=l 



(83) 



Then we have: 



( j— l 



dot 



E 5 eij>j e +i 



>j'=i,i 



(xj — xi) I ■ y 1 det 

Af>j>l / ?! 5„=0 



3-2+ E (*«!;(> j e +i+(€fl-«e)ii=j e +i) 



»,i=a. 



II fe-^) ■ E 4*" 4 'det 
Af>j>i / 5e®g =1 [o,i e ] 

n ^ - x i)} ■ e 

M>j>i>l 



I J-^ jb i (Ze 1 j>j e +( s e-Ze) 1 j>.i fl +i 1 j g+1 - J g >2) 
x p+i 



e n< 



The sum on the right-hand-side in H87H contains f ~[ (<5q. + 1) terms. Here the parameters 

2=1 9j =1 

(aP +1 , J p+1 ,<5»+ 1 )^l~ 1 with |JP +1 | = |<5P +1 | = a p+1 for p = 0,...,M - 1 constitute a 
branching process and satisfy following recursion relations: 



,p+i 



jp+1 



a p + > 1 ,p _ jp >9 
t—i J e+i J e- 
e=i 

I \ 

J p - 1 J p - \ T p - 1 7 P 7 P - 1 T p 



/,= t 



/ 

CP CP cP xP _ cP cP xP _ cP 

«1> ■ ' ■ '?Ai'?Ai+l'°Ai+l ?A 1 +l'---'? J 4 1 +A 2 ' Ai+A 2 ^Aj+Aa 



for p = 1, . . . , M - 1. The constraint, X] (^9 + 2j 4q+l) = a p+1 , holds where A q counts the 

9 = 1 

number of consecutive adjacent elements in the sequence J p and j4 9 _|_i counts the number of 
consecutive non-adjacent elements in the sequence J p . Here q = 1, ... ,ui. 

Proof: We denote by Q( n ' (xi , x^) := x i x 2 a totally symmetric polynomial 



of order n in the variables x\ and X2 and by nj := j — 2 + ^2 Sg^-j> J g +i f° r j = 2, . . . , M 

the power index of the matrix element. Now, we multiply the first row of the determinant by 
minus unity and add to all following rows, i.e. to the second, the third, up to the M row, we 
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factor out a term (x p — si) from the p th row, and, from the multi-linearity of the determinant, 
we get: 

det = ( n ( x p - xi ) I det (Q {nj) ( x i, x p)) M ' M (89) 

In the next step we modify the j th column, for j = 2, . . . , M. We multiply the first column by 
(— the second column by (~x" j ), and so on and so forth, up to the Ji th column by 
(—x" j Jl +1 ) and add them all to the j th column. It is not hard to see that we get: 

n.; —l. 



det ( ^2 x l p x" J ) 
U=Ji-l 



9.J'=2,2 



det 1,^ ^- J , X) E 4 M " Jl x? 1 (90) 



Zl — Jl — 1 ^M — J 1 — Jl — 1 



A/ 



p=2 



Here in the first J\ — 1 columns (whose labels are 2, . . . , Ji) we have single powers terms, 
whereas in the last M — J\ columns (whose labels are Ji + 1 , . . . , M) we have sums of nj — Ji + 2 
power terms. Thus at positions j ' = Jg + 1 for = 1, . . . , a the number of terms in the sums 
increases by <5i , ^2 + 1, . . . , 5 a + l whereas at all remaining positions, with j > Ji + 1, the number 
of terms in the sums increases by one. Now we expand the determinant and we deduce, from 
the multi-linearity and antisymmetric property of the determinant that the sums at positions 
j = Jg + 1 can run over the whole range of index values, whereas the sums at the remaining 
positions pick up only maximal values of the summation index. Otherwise in the determinants 
that result from the expansion there are at least two columns that arc proportional to each 
other, hence the determinants are zero. In other words we have: 

h e {Ji - 1,..., Ji - l + <5i},Z 2 = Ji +5i,...,0 2 -Ji = J2-2 + <5i (91) 
h 2 -j x +i e {J2 -l + Si,..., J 2 - l + A 2 },lj 2 ~ Jl+ 2 =Ja + A a ,..., 

lj 3 -j 2 = J 3 -2 + A 2 (92) 



'Ja-Ja-I+I 6 i J °- ~ 1 + A.-1,. • ■ , Ja - 1 + A a } ,lj a -J a _ 1+2 = Ja+A a ,..., 

Im-j, = M-2 + A a (93) 

From this follows the result (185 1> - The final result 187t follows from iterating (186 D . q.e.d. 
Example 1: Take a = 1 and S\ = 1. Then we have: 

j-l+l i > J _ r \M.M 



p,i=i>i 



Sl detf4- 1+1 ^-'+ 1 ) M ' M_1 + detf4- 1+1 ^ J ) M '^ 1 
V ^ 'p,J=2,l V ^ J p, j=2,l 



= (94) 



n ~ 

M>j>1 

\ M-J 

11 '-, ' e n **« ( 95 ) 

M>j>i>l y l<»i<...<ijvf_ j<M 9=1 

The last step H95H is proven by mathematical induction in M, for example. 
Example 2: Take a = 1 and <5i > 0. Then we have: 

det (*r 1+4il '-' +i ) ' . = n ^ - *o 

\Af>3>i>l 

M - 1 M - 1 M 

M E E «|)-( E ff ) 

>; >; >: n<; 5=1 9=1 (96) 

q=l 
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The result Il96jl follows from l|87| l along with the recursion relations {88) . 



Appendix B 

Here we prove formula J7J for the inverse of the Vandermonde matrix (2tj,«)^-Q ^=1 = [(i" _1 )j'„-- 

= ( ' r det ((p + l p > i )«- 1+1 «>»+i ) (97) 

l<p<9<M 

/(_l)i+n+l n (q-p + l,>n-l p >n)\ 

I l<y<?<M-l 

fl (g~^p) x 

\ l<p< 9 <M / 

M-l-j 

e n (v+ i %>o ( 98 ) 

l<i 1 <...<i M _ 1 _ 3 <M— 1 9=1 



(n - 1)!(M - n)! 



^"^(M-l) (99) 



In I I97H we expressed the inverse matrix as the transposed matrix of algebraic complements 
and in H98H we used identity II95I I to compute the algebraic complements. In H99I I we evaluated 
the sum of the product in a recursive way as follows: 

M-l-j M-l 
S^-W(M-l) := J2 EI (<9 + !*,>«)= E S M - 2 ~Hl -W + h> n ) 

l<il <...<ij\/_!_ j <M— 1 q = l l — l 

M-l-j \ 

£ ^+i+p(A/)(-nH (100) 
p=0 / 

with S°(M— 1) = 1. The last equality has been obtained by computing the sums S M ~ 1 ~ J (M — 
1) recursively for j = M — 1, M — 2, M — 3, . . .. Here we noticed that the result is an order- 
j polynomial in the column number n, with the coefficient at the p th power of n being a 
polynomials in M of order 2j — 2p. The later polynomials satisfy recursion relations l|9j, which 
we verified using Mathematica using the piece of code provided in Appendix C. Changing the 
variables j = j + I and j = j yields expression J7J . 

j 

Now, we prove formula {§}. In 11981 we write i q = q + ^ ^q>q — (p— l) f° r 9 = 1, • ■ • j Af — 

p=l _ p 

1 — j and for some 1 < qi < (j2 < ■ ■ • < Qj < M — 1. We split the product under the sum 
into a product of (j + 1) products such that, in the p th product, the index q runs within the 
range q = q p — (p — 1), . . . , %>+i — (p + 1) for p = 0, . . . , j with <?j+i = M — 1. Then we change 

variables q p = q + p for q = q p — (p — 1), . . . , q p +i — (p + 1). Then we assume that the column 
index n satisfies the following inequality q p + l < n < This implies that in the first (p — 1) 

products (from left to right), all the indicator functions equal zero, in the p th product some 
indicator functions are zero and the others are one, and finally, in the last (p + 1) products 
all the indicator functions equal unity. This allows us to factor out the term M\/n from the 
product and absorb it into the prefactor in J98D thus producing the binomial coefficient. The 
remaining sum over the values of the q indices is left unevaluated. It is possible to obtain the 
large M limit of that sum easily. 



Appendix C 



Here we give closed form expressions for certain polynomials that are used to invert the Van- 
dermonde matrix. The expressions have been obtained by solving the recursion relations in 
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g}. Define P M -j{M) := P M -j(M)/ ( 'u (M — p) 

Pm-i(M) = i (101) 

P M -2{M) = i(3M + 2) (102) 

Pm-s(M) = -^—M(M + 1) (103) 
4o 

(15M 3 + 15M 2 - 10M - 8) 

P M -i{M) = ± '- (104) 

v ' 5760 

M(M + 1) (3M 2 — M — 6) 

P M - 5 (M) = — i- (105 

; 11520 v ; 

(63M 5 - 315M 3 - 224M 2 + 140M + 96) 

P M -s{M) = ± i (106 

v ; 2903040 

M(M + 1) (9M 4 - 18M 3 - 57M 2 + 34Af + 80) 

P M - 7 (M) = — i-i — i. (107 

v ' 5806080 v ; 



Pm-s(M) 



P M -9{M) 



Pm-w(M) 



(135M 7 - 315M 6 - 1575M 5 + 735M 4 + 5320M 3 ) 
1393459200 
(2820A/ 2 - 1936M - 1152) 
H 1393459200 

M(M + 1) (15M 6 - 75Af 5 - 135M 4 + 527M 3 + 768M 2 ) 

2786918400 
M(M + 1) (668M + 1008) 
2786918400 

(99A/ 9 - 594M 8 - 1386M 7 + 6468M 6 + 14091M 5 ) 
367873228800 
(12826M 4 + 44132M 3 + 18392A/ 2 - 14432M - 7680) 
367873228800 



(108) 



(109) 



(110) 



PM-j(M) = + 0(Mi- a )) (111) 

Here we attach a piece of code in the symbolic computation language Mathematica. This code 
solves the recursion relations (O and verifies the result J?} for the inverse of the Vandcrmondc 
matrix. 

M =.;MMAX = 20;MyP[0, M_] = 1; 

id = QpenWrite ["Polynomials . dat"] ; 

Set0ptions[id, Format Type -> TeXForm] ; 

Do[ 

MyP[j, MJ = Fact or [-Sum [( Sum [ MyP[j - p, 1] (-l)--Op, {p, 1, j}]), {1, 1, M}]]; 
Write[id, "P(M-", j, ",M) = ", MyP[j, 1] /. {1 :> M>, FormatType -> TeXForm]; 

Print ["P CM-", j, ",M) = ", Simplify [(MyP[j , 1] /. {1 :> M}) /Product [CM - qq) , {qq, -1, j - 1}]]]: 
, {j , 1 , MMAX}] ; 

Close [id] ;M =. ; 
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MyPP[j_, M_] := MyP [j , 1] /. {1 :> M}; 

Mylnverse[j_, n_, M_] := (-l)"OCj + n)/((n - 1) ! (M - n)!)Sum[ MyPP[j - p, M](-n)"Op, {p, 0, j}] ; 

MyVandemonde [M_] := Table [ j~-Q(n - 1), {j , 1, M}, {n, 1, M>] ; 

M = MMAX ; MatrixForm [MyVandemonde [M] ] 

MM = Inverse [MyVandemonde [M] ] ; 

MatrixForm [MM] 

Do[ 

Print ["Checking M-", j, "th row: ", Table [My Inverse [j , n, M] , {n, 1, M}] - MM [ [M - j]]]; 
, {j , 1, MMAX - 1}] ; 



Appendix D 

In this appendix, we present a starting point for determining both necessary and sufficient 
conditions for the divergent behaviour discussed in the paper. 

We solve the system of equations l|40j l by Gaussian elimination. We eliminate the (M — p) th 

n 

row, for p = 1, . . . , M — s — 1, apply the identity x n — y n = (x — y) ^2 x n ~ t y t , divide the 

t=l 

equation by (n — M -f p — 1) and easily arrive at the following result: 

/ M \ 

I (-i) M , EHM-ib-A-i 

M 

o,(-i) M , E (-i) p JVsrj£- 3 (p-i-i), •• 

M 

o, o, (-i) A ', E (-ipp P s£f_ 3 (p-j-2), .. 



M 

0, 0, 0, (-1) A/ , E (-irP P S^(p-j-M + s + l), .. 

\ p=j+M-s-l ) 

(112) 

The rows correspond to n = M, ... ,s + 1 (from top to bottom) and the columns correspond 
to j = M, . . . , 1 (from left to right). In the bottom row there are (M — s — 1) zeros in columns 
M, M — 1, . . . , s + 2. For brevity we have dropped the argument M in the P P (M) polynomials, 
i.e. we have P p := P P (M). The coefficients S^ 1 (x) read: 

M M 

s. M (x)= £ II '/" >J ./> ' i (us) 

M—s q— s+l j— 3 + 1 

9=1 

= 7W I r77 E (-l) 9 C 9 M - s - 1 ( S + g + ir+ A/ - 3 - 1 (114) 

X— 1 \ / X \ 

n (m - « + g ) hr m— «o;(«) (us) 

9 = / \q=0 I 
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if s = M 
if s = M - 1 
if s = M- 2 (116) 
if s = M - 3 

i (M*+ s - 3(M - l)* 4 " 3 + 3(M - 2) a: + 3 - (M - S)* 4 - 3 ) if s = M - 4 



1 

M x 

M x+1 — (M — l) x+1 
i (M*+ 2 - 2(M - l) x + 2 + (M - 2) x + 2 ) 



The quantities Q x (s) are polynomials of order x\ they satisfy the following recursion relations: 



0<ij2 <9i <g 



1 

1 l + s 

2 2 

1 6a+7 3s 2 + 5s+2 

8 24 2 24 3 2 

J_ 3s+4 3s 2 +6s+3 s J +2s 2 +s 

48 48 48 48 



(117) 

for x = 1,2,3 (row-wise) and q = 0,...,x (column-wise) subject to Q[j(s) = 1. Here the 
coefficients A x (s) are defined in 1751 . The result l|117ll follows from inserting l|115| l into the 
second equality on the right hand side in (1 1 1 3 1 > and performing the sum over j using 1731 . 
From l(TT2l we see that the solution to the system of equations l|40|l depends on s pa- 

rameters. We choose the first s reduced probabilities y-j — J as the parameters. In each 

equation in 1401 1 we move the last s terms onto the right-hand side and we obtain a system 
of M — s equations with an upper triangular quadratic matrix. We eliminate the consecutive 
variables and obtain the following solution: 



= E 



E (-i) a+1 

?=0 



E^ 



9-1 

n Uiji+i) 
i=~i 

3-i>jo>...>js-i>»+i n Ui,h) 



E 



j = s+l 



(118) 
(119) 



where 

M 

(i,J):= E {-WPpSfUip-j + i-M) (120) 

p— M — i 

for j < i and i = M, . . . , s + 1 and subject to j—i = j and jq = q The step 11181 1^ 11191 follows 
from the fact that (i, i) = ( — 1) M . Here we denoted: 



q=0 3-l>J0>-->W-l>s+li=-l 



j = s + l 



The sum in 11211 1 contains 2 J 9 terms. 

Now we compute the matrix elements in 11201 for big values of M. We have: 



(121) 



«,•)= f c ir (M+1)! ( MJW 



°(M -p)! 



0(M 



M-p-2 



(P-J)l 

(M-i)! 

M 

E M) p 

p=M+j-i 



p — j + i — M 

^ M p-j + i-M-q^ 

9=0 



p—j-\-i — M 



(2(p-j + i-M))\\ 



)) x 



(i_l)9 +0 (j9-i)) 



(M + l)! M _1_J+i / (p-j)\ 



o\ 2 M ~v{M -p)\ \{M - i)\ 



(122) 
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(M + 1)M 



-l-j+i 



E 



Ml 



p\(M -p)\ 



(M — i)\(p — j + i — M)\ 



_(_l)^-*+i^ ( _l)P C| 



p=0 



M-i+j 



(123) 

(124) 
(125) 

I rlV+ j + m 1 " * + " i (1 + 7) I if HmM-,00 £ = 7 ( > ^ 



2~i+i 



M-i+; 



i—3—V M—i 



v Af ' 



y^yi — 1 



if IimM-nxj -57 = 



In d 1 22 1 ) we used (1115ft along with 11171 to transform the quantities Sf^_ 1 (p — j + i — M) and 
12J along with (1 1 1 It to transform the quantities P P (M). In 11231 1 we neglected the lower order 
powers of M, we simplified the expression, and we performed the sum over q = 0, ■ . ■ ,p — 
j + i — M using the binomial expansion formula. In 11241 1 we simplified the expression. In 
11251 we substituted for p — M — j + % and simplified the expression. Finally in 11251 we 
performed the sum over p in the case of big values of i and we expressed that sum through the 
hypergeometric function otherwise. Since the latter sum is a hypergeometric sum and since 
necessary and sufficient conditions for such sums to be expressed in closed form are known 
(see 1211 and references therein) it would be interesting to check if a closed form expression 
can be also found in the latter case. 

Now we compute the quantities in 11211 1 in the large-M limit. We insert the top result 
in 11261 into 11211 1 and we do the sums over the ordered sequences by using identity 1701 1 for 
Pi = s — q and p2 = 0. The final result reads 2J 9 = (27 9 ,j) A 



where: 



9Ja 



{-—y~ q 

y 2 ' 



where 



v 



! 1 =l....,s-q + l 
l 2 =l,...,j-s+2 



E£»gi+l£,(j-s— 9i)A(s— q+1) 

0<ff I <j-s-l 



(127) 
(128) 
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